Solving for Best Responses and Equilibria in Extensive-Form Games with Reinforcement Learning Methods
Abstract
We present a framework to solve for best responses and equilibria in an extensive-form game (EFG) of imperfect information by transforming the game into a set of Markov decision processes (MDPs), and then applying simulation-based reinforcement learning to those MDPs. More specifically, we first transform a turn-taking partially observable Markov game (TT-POMG) into a set (one per player) of partially observable Markov decision processes (POMDPs), and we then transform that set of POMDPs into a corresponding set of MDPs. Next, we observe that EFGs are a special case of TT-POMGs, and hence can be transformed as described. Furthermore, because each transformation preserves the strategically relevant information of the model to which it is applied, an optimal policy in one of the ensuing MDPs corresponds to a best response in the original EFG. We then prove that our reinforcement learning algorithm finds a near-optimal policy (and therefore a near-best response in the original EFG) in finite time, although its sample complexity is bounded below by a function that depends exponentially on the horizon. Nonetheless, we apply this algorithm iteratively to search for equilibria in an EFG. When the iterative procedure converges, the resulting MDP policies comprise an approximate Bayes-Nash equilibrium. Although this procedure is not guaranteed to converge, it frequently did so in numerical experiments with sequential auctions.
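The iterative search described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a toy symmetric one-shot first-price auction (not the sequential auctions of the experiments), replaces the reinforcement learning step with plain Monte Carlo evaluation of each bid, and uses hypothetical names throughout (`VALUES`, `BIDS`, `simulate_payoff`, `best_response`, `iterate_best_responses`). With the opponent's policy held fixed, each bidder faces a one-step decision problem whose state is the bidder's own private value.

```python
import random

VALUES = [0, 1, 2, 3, 4]  # hypothetical private values, drawn uniformly
BIDS = [0, 1, 2, 3, 4]    # hypothetical discrete bid grid


def simulate_payoff(value, bid, opponent_policy, rng):
    """Sample one play of a first-price auction against a fixed opponent."""
    opp_value = rng.choice(VALUES)
    opp_bid = opponent_policy[opp_value]
    # Higher bid wins; ties are broken by a fair coin flip.
    if bid > opp_bid or (bid == opp_bid and rng.random() < 0.5):
        return float(value - bid)
    return 0.0


def best_response(opponent_policy, n_samples, rng):
    """Estimate a best response by simulation, one bid per observed value."""
    policy = {}
    for value in VALUES:
        best_bid, best_estimate = None, float("-inf")
        for bid in BIDS:
            total = sum(simulate_payoff(value, bid, opponent_policy, rng)
                        for _ in range(n_samples))
            estimate = total / n_samples
            if estimate > best_estimate:  # strict: ties keep the lower bid
                best_bid, best_estimate = bid, estimate
        policy[value] = best_bid
    return policy


def iterate_best_responses(n_samples=2000, max_iters=20, seed=0):
    """Repeat best responses in self-play until the policy stops changing."""
    rng = random.Random(seed)
    policy = {v: v for v in VALUES}  # assumed starting point: truthful bids
    for _ in range(max_iters):
        new_policy = best_response(policy, n_samples, rng)
        if new_policy == policy:
            return policy, True   # fixed point under sampled estimates
        policy = new_policy
    return policy, False          # no convergence within max_iters


if __name__ == "__main__":
    print(iterate_best_responses())
```

If the loop reaches a fixed point, the returned policy is a best response to itself up to sampling error, which is the sense in which a converged run yields an approximate equilibrium. As in the abstract, nothing guarantees that the loop converges, so the second return value reports whether it did.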
Related resources
Solving for Best Responses in Extensive-Form Games using Reinforcement Learning Methods
We present a framework to solve for best responses in extensive-form games (EFGs) with imperfect information by transforming the games into Information-Set MDPs (ISMDPs), and then applying simulation-based reinforcement learning methods to the ISMDPs. We first show that, from the point of view of a single player, an EFG can be represented as an Information-Set POMDP (ISPOMDP) whose states corre...
Regularized Best Responses and Reinforcement Learning in Games
We investigate a class of reinforcement learning dynamics in which each player plays a “regularized best response” to a score vector consisting of his actions’ cumulative payoffs. Regularized best responses are single-valued regularizations of ordinary best responses obtained by maximizing the difference between a player’s expected cumulative payoff and a (strongly) convex penalty term. In cont...
Lecture Notes on Game Theory
1. Extensive form games with perfect information; 1.1. Chess; 1.2. Definition of extensive form games with perfect information; 1.3. The ultimatum game; 1.4. Equilibria; 1.5. The centipede game; 1.6. Subgames and subgame perfect equilibria; 1.7. Backward induction, Kuhn’s Theorem and a proof of Zermelo’s Theorem; 2. Strategic form games; 2.1. Definition; 2.2. Nash equilibria; 2.3....
Solving extensive-form games with double-oracle methods
We investigate iterative algorithms for computing exact Nash equilibria in two-player zero-sum extensive-form games. The algorithms use an algorithmic framework of double-oracle methods. The main idea is to restrict the game by allowing the players to play only some of the strategies, and then iteratively solve this restricted game and exploit fast best-response algorithms to add additional str...
Joint Learning in Stochastic Games: Playing Coordination Games Within Coalitions
Despite the progress in multiagent reinforcement learning via formalisms based on stochastic games, these have difficulties coping with a high number of agents due to the combinatorial explosion in the number of joint actions. One possible way to reduce the complexity of the problem is to let agents form groups of limited size so that the number of the joint actions is reduced. This paper inves...
Publication date: 2015